A Hybrid Arabic Text Summarization Technique Based on Text Structure and Topic Identification
Abstract
We present a hybrid approach to Arabic text summarization. The approach focuses on extracting and ranking text segments with heuristic methods that assign a weighted score to each segment. In addition, we use a text categorization system and Arabic WordNet to identify the thematic structure of the input text, so that the most relevant sentences can be selected from those produced by the statistical analysis. To identify relevant segments, we use a tokenizer, a stemmer, and other statistical tools borrowed from traditional information retrieval. The source document is segmented into its major units (title, paragraphs, and lines), and the text lines are then analyzed to extract relevant segments for inclusion in the summary. The system was evaluated by 1,200 human evaluators. Each evaluator was given a newspaper article and its system-generated summary and asked to classify the summary as “rejected,” “not-related,” “satisfactory,” “good,” or “accepted.” Of the summaries, 76.92% were judged “good” or “accepted,” and 92.34% were judged “satisfactory,” “good,” or “accepted.” These results confirm the viability of this hybrid approach to Arabic text summarization.
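The abstract names the ingredients of the method: segmentation into title, paragraphs, and lines; tokenizing and stemming; weighted heuristic scores; and topic identification. It does not give the formulas or weights. The Python sketch below illustrates how such a line-scoring pipeline could fit together, under stated assumptions. Several parts are hypothetical stand-ins and are not the authors' method:

- the light stemmer (`light_stem`) and its affix lists;
- the four features and their weights (`w_tf`, `w_title`, `w_pos`, `w_topic`);
- passing topic terms as a plain list, which replaces the text categorization system and Arabic WordNet.

```python
import re
from collections import Counter

# Hypothetical light stemmer: strips one common Arabic prefix and one suffix.
# An illustrative assumption only; the abstract does not specify the stemmer.
PREFIXES = ("وال", "بال", "كال", "فال", "ال")
SUFFIXES = ("ات", "ون", "ين", "ها", "ة", "ه")


def light_stem(token):
    for p in PREFIXES:  # longer prefixes are listed first
        if token.startswith(p) and len(token) - len(p) >= 3:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= 3:
            token = token[:-len(s)]
            break
    return token


def tokenize(text):
    # \w+ is Unicode-aware in Python 3, so it also matches Arabic letters.
    return [light_stem(t) for t in re.findall(r"\w+", text.lower())]


def segment(document):
    """Split a document into (title, paragraphs), each paragraph a list of lines.

    Assumes the first line is the title and paragraphs are separated by blank lines.
    """
    title, _, body = document.strip().partition("\n")
    paragraphs = [p for p in re.split(r"\n\s*\n", body.strip()) if p.strip()]
    return title.strip(), [[ln.strip() for ln in p.splitlines() if ln.strip()]
                           for p in paragraphs]


def score_lines(document, topic_terms=(), w_tf=1.0, w_title=2.0,
                w_pos=1.0, w_topic=2.0):
    """Return (score, order, line) for every line; features and weights are hypothetical."""
    title, paragraphs = segment(document)
    title_terms = set(tokenize(title))
    topic = set(tokenize(" ".join(topic_terms)))  # stand-in for topic identification
    lines = [(p_idx, l_idx, ln) for p_idx, para in enumerate(paragraphs)
             for l_idx, ln in enumerate(para)]
    freq = Counter(t for _, _, ln in lines for t in tokenize(ln))
    max_freq = max(freq.values(), default=1)
    scored = []
    for p_idx, l_idx, line in lines:
        terms = tokenize(line)
        if not terms:
            continue
        # Average document frequency of the line's terms, normalized to [0, 1].
        tf = sum(freq[t] for t in terms) / (len(terms) * max_freq)
        # Fraction of title terms that also occur in the line.
        title_overlap = len(title_terms & set(terms)) / max(len(title_terms), 1)
        # Earlier paragraphs score higher; a paragraph's first line gets a bonus.
        position = 1.0 / (1 + p_idx) + (0.5 if l_idx == 0 else 0.0)
        # Fraction of the supplied topic terms that occur in the line.
        topic_hits = len(topic & set(terms)) / max(len(topic), 1)
        score = (w_tf * tf + w_title * title_overlap
                 + w_pos * position + w_topic * topic_hits)
        scored.append((score, len(scored), line))
    return scored


def summarize(document, k=2, topic_terms=()):
    """Pick the k highest-scoring lines and return them in document order."""
    scored = score_lines(document, topic_terms)
    top = sorted(scored, key=lambda s: (-s[0], s[1]))[:k]
    return [line for _, _, line in sorted(top, key=lambda s: s[1])]
```

For example, `summarize(article, k=3, topic_terms=[...])` returns the three highest-scoring lines in their original order. Keeping document order is a common choice in extractive summarization because it helps the summary read coherently.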
Similar resources
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data in order to extract useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
Text Summarization Using Cuckoo Search Optimization Algorithm
Today, with the rapid growth of the World Wide Web and the creation of Internet sites and online text resources, text summarization has attracted considerable attention from researchers. Extractive text summarization is an important summarization method that consists of selecting the top representative sentences from the input document. When we face documents with large data volumes, the extr...
Extraction-Based Text Summarization Using Fuzzy Analysis
Due to the explosive growth of the World Wide Web, automatic text summarization has become an essential tool for web users. In this paper we present a novel approach for creating text summaries. Using fuzzy logic and WordNet, our model extracts the most relevant sentences from an original document. The approach utilizes fuzzy measures and inference on the extracted textual information from the docu...
Systematic literature review of fuzzy logic based text summarization
“Information overload” is not a new term, but massive developments in technology now enable anytime, anywhere, easy and unlimited access to information. Participation in and publishing of information have consequently escalated its impact. Assisting users’ informational searches with reduced reading and surfing time by extracting and evaluating accurate, authentic and relevant information are the primary c...
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
Automatic recognition of printed texts and manuscripts makes it easier to enter data into computers and digitize it, and it is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
Journal:
- Int. J. Comput. Proc. Oriental Lang.
Volume 23, Issue -
Pages -
Published 2011